Lancaster
Ensembling geophysical models with Bayesian Neural Networks
Ensembles of geophysical models improve prediction accuracy and express uncertainties. We develop a novel data-driven ensembling strategy for combining geophysical models using Bayesian Neural Networks, which infers spatiotemporally varying model weights and bias, while accounting for heteroscedastic uncertainties in the observations. This produces more accurate and uncertainty-aware predictions without sacrificing interpretability.
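As a rough illustration of the idea in the abstract above, the following Python sketch combines several model predictions at a single location and time using weights and a bias, then scores the result against an observation with an input-dependent noise level. The softmax parameterisation of the weights, the function names, and the plain numeric inputs are assumptions made for illustration; in the paper these quantities are inferred by a Bayesian Neural Network, which this sketch does not implement.

```python
import math


def softmax(logits):
    """Map unconstrained logits to positive weights that sum to one."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_mean(model_preds, weight_logits, bias):
    """Weighted combination of model predictions plus a bias term.

    All arguments refer to one location and time. In a full method the
    logits and bias would be outputs of a network evaluated there
    (assumption: here they are passed in directly).
    """
    weights = softmax(weight_logits)
    return sum(w * p for w, p in zip(weights, model_preds)) + bias


def gaussian_nll(obs, mean, sigma):
    """Negative log-likelihood of one observation under N(mean, sigma^2).

    Letting sigma differ per observation is what makes the noise model
    heteroscedastic.
    """
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (obs - mean) ** 2 / (2 * sigma ** 2)
```

Because the weights are explicit and sum to one, they can be read directly as the relative contribution of each model at that point, which is one way the interpretability mentioned in the abstract could be retained.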
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > Saint Martin (0.04)
- Europe > United Kingdom > England > Lancashire > Lancaster (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
AfriStereo: A Culturally Grounded Dataset for Evaluating Stereotypical Bias in Large Language Models
Beux, Yann Le, Audu, Oluchi, Ankeli, Oche D., Balakrishnan, Dhananjay, Weya, Melissah, Ralaiarinosy, Marie D., Ezeani, Ignatius
Existing AI bias evaluation benchmarks largely reflect Western perspectives, leaving African contexts underrepresented and enabling harmful stereotypes in applications across various domains. To address this gap, we introduce AfriStereo, the first open-source African stereotype dataset and evaluation framework grounded in local socio-cultural contexts. Through community-engaged efforts across Senegal, Kenya, and Nigeria, we collected 1,163 stereotypes spanning gender, ethnicity, religion, age, and profession. Using few-shot prompting with human-in-the-loop validation, we augmented the dataset to over 5,000 stereotype-antistereotype pairs. Entries were validated through semantic clustering and manual annotation by culturally informed reviewers. Preliminary evaluation of language models reveals that nine of eleven models exhibit statistically significant bias, with Bias Preference Ratios (BPR) ranging from 0.63 to 0.78 (p <= 0.05), indicating systematic preferences for stereotypes over antistereotypes, particularly across age, profession, and gender dimensions. Domain-specific models appeared to show weaker bias in our setup, suggesting task-specific training may mitigate some associations. Looking ahead, AfriStereo opens pathways for future research on culturally grounded bias evaluation and mitigation, offering key methodologies for the AI community on building more equitable, context-aware, and globally inclusive NLP technologies.
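The abstract does not define the Bias Preference Ratio, so the sketch below rests on an assumption: BPR is taken to be the fraction of stereotype-antistereotype pairs for which a model scores the stereotype higher, and significance is checked with an exact two-sided binomial test against a no-preference rate of 0.5. The function names and the score format are hypothetical.

```python
from math import comb


def bias_preference_ratio(pair_scores):
    """Fraction of pairs where the stereotype outscores the antistereotype.

    pair_scores: list of (stereotype_score, antistereotype_score) tuples,
    e.g. sentence log-likelihoods (assumed format). Ties count as no
    preference for the stereotype.
    """
    if not pair_scores:
        raise ValueError("need at least one pair")
    preferred = sum(1 for stereo, anti in pair_scores if stereo > anti)
    return preferred / len(pair_scores)


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null success rate p.
    """
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = probs[k] * (1 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= threshold))


# Toy usage with made-up scores: 3 of 4 pairs favour the stereotype.
toy_pairs = [(-1.0, -2.0), (-1.5, -1.0), (-0.5, -3.0), (-2.0, -2.5)]
toy_bpr = bias_preference_ratio(toy_pairs)
```

Under this reading, a BPR of 0.5 means no systematic preference, and values in the reported 0.63 to 0.78 range mean the stereotype was preferred in roughly two thirds to three quarters of pairs.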
- Research Report > Experimental Study (0.66)
- Research Report > New Finding (0.48)
Mitigating Semantic Drift: Evaluating LLMs' Efficacy in Psychotherapy through MI Dialogue Summarization
Kumar, Vivek, Rajawat, Pushpraj Singh, Ntoutsi, Eirini
Recent advancements in large language models (LLMs) have shown their potential across both general and domain-specific tasks. However, there is a growing concern regarding their lack of sensitivity, factual incorrectness in responses, inconsistent expressions of empathy, bias, hallucinations, and overall inability to capture the depth and complexity of human understanding, especially in low-resource and sensitive domains such as psychology. To address these challenges, our study employs a mixed-methods approach to evaluate the efficacy of LLMs in psychotherapy. We use LLMs to generate precise summaries of motivational interviewing (MI) dialogues and design a two-stage annotation scheme based on key components of the Motivational Interviewing Treatment Integrity (MITI) framework, namely evocation, collaboration, autonomy, direction, empathy, and a non-judgmental attitude. Using expert-annotated MI dialogues as ground truth, we formulate multi-class classification tasks to assess model performance under progressive prompting techniques, incorporating one-shot and few-shot prompting. Our results offer insights into LLMs' capacity for understanding complex psychological constructs and highlight best practices to mitigate "semantic drift" in therapeutic settings. Our work contributes to the MI community a high-quality annotated dataset that addresses data scarcity in low-resource domains, and it offers critical insights for using LLMs for precise contextual interpretation in complex behavioral therapy.
AraFinNews: Arabic Financial Summarisation with Domain-Adapted LLMs
We introduce AraFinNews, the largest publicly available Arabic financial news dataset to date, comprising 212,500 article-headline pairs spanning a decade of reporting from 2015 to 2025. Designed as an Arabic counterpart to major English summarisation corpora such as CNN/DailyMail, AraFinNews provides a realistic benchmark for evaluating domain-specific language understanding and generation in financial contexts. Using this resource, we investigate the impact of domain specificity on abstractive summarisation of Arabic financial texts with large language models (LLMs). In particular, we evaluate three transformer-based models (mT5, AraT5, and the domain-adapted FinAraT5) to examine how financial-domain pretraining influences accuracy, numerical reliability, and stylistic alignment with professional reporting. Experimental results show that domain-adapted models generate more coherent summaries, especially in their handling of quantitative and entity-centric information. These findings highlight the importance of domain-specific adaptation for improving narrative fluency in Arabic financial summarisation. The dataset is freely available for non-commercial research at https://github.com/ArabicNLP-uk/AraFinNews.
- Europe > United Kingdom > England > Lancashire > Lancaster (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- North America > Dominican Republic (0.04)
- Europe > United Kingdom > England > Lancashire > Lancaster (0.04)
- Europe > United Kingdom > Scotland > City of Glasgow > Glasgow (0.04)
- Europe > Spain (0.04)
- Asia > India > NCT > Delhi (0.04)
- Research Report > Experimental Study (0.67)
- Research Report > New Finding (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Lancashire > Lancaster (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Lancashire > Lancaster (0.04)
- Asia > Middle East > Jordan (0.04)
Why do zeroes happen? A model-based approach for demand classification
Svetunkov, Ivan, Sroginis, Anna
Effective demand forecasting is critical for inventory management, production planning, and decision making across industries. Selecting the appropriate model and suitable features to efficiently capture patterns in the data is one of the main challenges in demand forecasting. In reality, this becomes even more complicated when the recorded sales have zeroes, which can happen naturally or due to some anomalies, such as stockouts and recording errors. Mistreating the zeroes can lead to the application of inappropriate forecasting methods, and thus to poor decision making. Furthermore, the demand itself can have different fundamental characteristics, and being able to distinguish one type from another might bring substantial benefits in terms of accuracy and thus decision making. We propose a two-stage model-based classification framework that, in the first stage, identifies artificially occurring zeroes and, in the second, classifies demand into one of the possible types: regular/intermittent, intermittent smooth/lumpy, fractional/count. The framework relies on statistical modelling and information criteria. We argue that different types of demand need different features, and show empirically that they tend to increase the accuracy of the forecasting methods and reduce inventory costs compared to those applied directly to the dataset without the generated features and the two-stage framework.
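To make the phrase "relies on statistical modelling and information criteria" concrete, the sketch below shows the general mechanism only: fit competing models to a demand series and keep the one with the lowest AIC. The two candidate distributions (Poisson and geometric), the function names, and the toy series are stand-ins chosen because their maximum-likelihood fits have closed forms; they are assumptions and not the models used in the paper.

```python
import math


def poisson_loglik(counts):
    """Log-likelihood of non-negative integer counts under a Poisson fit by MLE."""
    lam = sum(counts) / len(counts)  # MLE of the rate is the sample mean
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in counts)


def geometric_loglik(counts):
    """Log-likelihood under a geometric distribution on {0, 1, 2, ...} fit by MLE."""
    mean = sum(counts) / len(counts)
    p = 1.0 / (1.0 + mean)  # MLE of the success probability
    return sum(math.log(p) + x * math.log(1.0 - p) for x in counts)


def aic(loglik, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik


def select_by_aic(counts):
    """Return the name of the candidate with the lowest AIC.

    Requires a non-empty series of non-negative integers with a positive
    mean, so that both fits are well defined.
    """
    if not counts or sum(counts) <= 0:
        raise ValueError("need a non-empty series with at least one positive value")
    scores = {
        "poisson": aic(poisson_loglik(counts), 1),
        "geometric": aic(geometric_loglik(counts), 1),
    }
    return min(scores, key=scores.get)


# Toy series (made up): mostly zeroes with rare large spikes vs. steady demand.
spiky = [0, 0, 0, 0, 10, 0, 0, 0, 12, 0]
steady = [2, 3, 2, 3, 2, 3, 2, 3]
```

On these toy inputs the heavier-tailed candidate is selected for the spiky series and the Poisson for the steady one, which mirrors how an information criterion can separate demand types without hand-set thresholds.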
- Europe > Austria > Vienna (0.14)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Europe > United Kingdom > England > Lancashire > Lancaster (0.04)
Evaluating Open-Source Vision-Language Models for Multimodal Sarcasm Detection
Basnet, Saroj, Farabi, Shafkat, Ranasinghe, Tharindu, Kanojia, Diptesh, Zampieri, Marcos
In this work, we evaluate seven state-of-the-art VLMs - BLIP2, InstructBLIP, OpenFlamingo, LLaVA, PaliGemma, Gemma3, and Qwen-VL - on their ability to detect multimodal sarcasm using zero-, one-, and few-shot prompting. Furthermore, we evaluate the models' capabilities in generating explanations for sarcastic instances. We evaluate the capabilities of VLMs on three benchmark sarcasm datasets (Muse, MMSD2.0, and SarcNet). Our primary objectives are twofold: (1) to quantify each model's performance in detecting sarcastic image-caption pairs, and (2) to assess their ability to generate human-quality explanations that highlight the visual-textual incongruities driving sarcasm. Our results indicate that, while current models achieve moderate success in binary sarcasm detection, they are still not able to generate high-quality explanations without task-specific fine-tuning.
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Surrey > Guildford (0.04)
- Europe > United Kingdom > England > Lancashire > Lancaster (0.04)
- Overview (0.93)
- Research Report > New Finding (0.66)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
ProtoMedX: Towards Explainable Multi-Modal Prototype Learning for Bone Health Classification
Pellicer, Alvaro Lopez, Mariucci, Andre, Angelov, Plamen, Bukhari, Marwan, Kerns, Jemma G.
Bone health studies are crucial in medical practice for the early detection and treatment of Osteopenia and Osteoporosis. Clinicians usually make a diagnosis based on densitometry (DEXA scans) and patient history. The application of AI in this field is an area of ongoing research. Most of the successful methods for this task include Deep Learning models that rely on vision alone (DEXA / X-ray imagery) geared towards high prediction accuracy, where explainability is disregarded and largely based on the post hoc assessment of input contributions. We propose ProtoMedX, a multi-modal model that uses both DEXA scans of the lumbar spine and patient records. ProtoMedX's prototype-based architecture is explainable by design, crucial for medical applications, especially in the context of the upcoming EU AI Act, as it allows explicit analysis of the model's decisions, especially the ones that are incorrect. ProtoMedX demonstrates state-of-the-art performance in bone health classification while also providing explanations that can be visually understood by clinicians. Using our dataset of 4,160 real NHS patients, the proposed ProtoMedX achieves 87.58% accuracy in vision-only tasks and 89.8% in its multi-modal variant, both approaches surpassing existing published methods.
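One common way a prototype-based classifier can be explainable by design is to assign each case to the class of its nearest stored prototype and report that prototype alongside the prediction. The sketch below shows this generic nearest-prototype rule under stated assumptions: the Euclidean distance, the two-dimensional toy embeddings, the "Normal" class label, and all function names are hypothetical and are not taken from ProtoMedX.

```python
import math


def nearest_prototype(embedding, prototypes):
    """Classify an embedding by its closest prototype.

    prototypes: dict mapping a class label to a list of prototype vectors.
    Returns (label, prototype_index, distance) so the decision can be
    traced back to the specific prototype that produced it.
    """
    best = None
    for label, vectors in prototypes.items():
        for index, proto in enumerate(vectors):
            distance = math.dist(embedding, proto)  # Euclidean distance
            if best is None or distance < best[2]:
                best = (label, index, distance)
    if best is None:
        raise ValueError("no prototypes supplied")
    return best


# Toy prototypes in a made-up 2-D embedding space.
toy_prototypes = {
    "Normal": [[0.0, 0.0]],
    "Osteopenia": [[1.0, 1.0]],
    "Osteoporosis": [[3.0, 3.0], [4.0, 2.5]],
}
toy_label, toy_index, toy_distance = nearest_prototype([0.9, 1.2], toy_prototypes)
```

Returning the matched prototype and its distance, not just the label, is what lets a reviewer inspect why a particular case was classified as it was, including the incorrect cases.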
- North America > United States (0.14)
- Asia > Pakistan (0.04)
- North America > Canada (0.04)
- Health & Medicine > Consumer Health (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.94)
- Education > Health & Safety > School Nutrition (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)